Abstract. Recent years have seen an increasing need for high-level
specification languages and for tools that generate code from specifications.
In this paper, we introduce a specification language, SL, which is tailored to
the writing of syntactic theories of language semantics. More specifically,
the language supports specifying primitive notions such as dynamic
constraints, contexts, axioms, and inference rules. We also introduce a
system which generates interpreters from SL specifications. A prototype
system is implemented and has been tested on a number of examples,
including a syntactic theory for Verilog.

1 Introduction



Syntactic theories have been developed to reason about many aspects of modern
programming languages [AB97, AFM+95, LS97, SS99, FLS99]. Having roots in
the λ-calculus, these theories rely on transforming source programs to other
source programs. Only the syntax of the programming language is relevant.

Experience shows that the development of such theories is error-prone. In
order to ensure that the theories are sensible, many properties need to be
checked. For example, we need to know whether we have enough rules to rewrite
a program to its value, whether the type of a program is preserved during
evaluation, and whether each program has a unique value. In many situations,
the proofs of these properties do not require deep insight. In fact, many
purported proofs suffer from being incomplete, and usually the missed case is
the problematic one. Thus, we think that in order to rely on syntactic
theories, it is essential to design tools that support their development. The
work described in this paper offers a first step in that direction.

We introduce the specification language SL, which can directly reflect the
primitive notions of syntactic theories such as evaluation contexts and dynamic
constraints. An experimental system has been implemented that generates
interpreters from SL specifications. Currently the generated interpreters are
programs in CAML [Cam], a dialect of the ML family. Various examples have been
tested, including the operational semantics of core-ML (λ-calculus with
built-in operations, store operations, and exception handling), type inference
for core-ML, a syntactic theory for a store encapsulation language [SS99], and
a syntactic theory for Verilog [FLS99].

The paper is organized as follows: Section 2 gives an overview of the SL 
system using the call-by-value λ-calculus as an example. Section 3 describes some
issues in compiling SL programs, such as type checking for contexts, pattern- 
matching, and code generation. Section 4 presents related work and concludes 
the paper. 



2 Overview of the SL System 

We introduce the call-by-value λ-calculus [Plo75] and show how it is specified in
the SL language. By convention, we call the SL language the meta-language,
and call the call-by-value λ-calculus the object-language.



2.1 A Syntactic Theory for Call-by-value λ-calculus

The set of terms of the call-by-value λ-calculus is generated inductively
over an infinite set of variables (ranged over by x, y, etc.); it includes
λ-abstractions and applications:

    Terms    M ::= x | λx.M | M M

The semantics is based on the β_v reduction rule, which requires a syntactic
definition of the notion of value:

    Values   V ::= λx.M

    (β_v)    (λx.M) V → M[x := V]

where M[x := V] is the term resulting from substituting V for the free
occurrences of variable x in M.

A call-by-value computation consists of successively applying the β_v
reduction rule to a subterm. Positions of β_v redexes are restricted by an
evaluation context, which is defined as follows:

    H ::= □ | H M | V H

where □ represents a "hole". If H is an evaluation context, then H[M] denotes
the term that results from placing M in the hole of H. The evaluation of a
program is then defined by a stepping relation, denoted by ↦, given as follows:

         M → M'
    ----------------
    H[M] ↦ H[M']



SIGNATURE:

type M = Var of string | Lam of string*M | App of M*M;;

startfrom M;;

SPECIFICATION:
#open "namesupply";;
let rec subst (t1,x,t2) =
  match t1 with
    Var s -> if s = x then t2 else t1
  | Lam(s,t1') -> if s = x then t1
                  else let s' = freshname() in
                       Lam(s', subst(subst(t1',s,Var s'),x,t2))
  | App(t11,t12) -> App(subst(t11,x,t2), subst(t12,x,t2));;

dynamic V = Lam _;;

axiom betav: App(Lam(x,t1), (t2:V)) ==> subst(t1, x, t2);;

context H = BOX | App(H,_) | App(V,H);;

inference eval:
       t1 ==> t2
  --------------------
  (h:H) t1 |==> h t2



Fig. 1. An SL Specification of a Simple CBV Language 



2.2 Representation in SL 

The specification of the call-by-value λ-calculus in SL is given in Fig. 1. The
SIGNATURE part describes the abstract syntax of the language using Caml type 
definitions. In general, the SL type definitions may be polymorphic but are 
restricted to first-order type expressions (no function types). To account for 
cases in which the description needs more than one type, the type of programs 
in the object language is explicitly given by the startfrom phrase. 

The SPECIFICATION part describes the semantics of the language. A dynamic 
definition defines a subset of a type with a semantic significance. The axioms 
are conditional rewriting rules. The optional conditions are Caml expressions 
following the keyword when. The meta-language also has a primitive notion of 
contexts with BOX as the empty context. Each inference rule has one premise 
clause and one conclusion clause, also with an optional condition expression. 
Axioms and inference rules use a richer notion of pattern-matching than the 
one used in most functional languages: they include dynamic constraints like 
in t2:V, context constraints and context fillings like in h:H and h t2.
Meta-operations of the semantics, like substitution, are written directly in
Caml as auxiliary definitions.

2.3 Generating interpreters 

The SL system is very domain-specific, targeting exactly the kind of semantic 
specifications based on syntactic theories. In addition, it performs basic checks to 
ensure the specifications are well-formed. Other than the basic syntactic checks, 
the SL system has a (meta-)type system that ensures that contexts are used 
appropriately, e.g., every context has one hole, and contexts are filled with ex- 
pressions of the appropriate types, and both sides of each axiom have the same 
type. After performing these basic checks, the SL system compiles the
specification into a non-deterministic automaton, which is then transliterated
into Caml code. The code uses success continuations to encode the sequencing of
states, and exceptions with handlers to encode the non-deterministic selection
of a state.

Feeding the code in Fig. 1 to the SL system produces an interpreter. This 
interpreter can then be invoked on terms of the language to evaluate them by 
repeatedly decomposing them into evaluation contexts and redexes, and con- 
tracting the redexes, until an answer is reached. For example, if the input file 
contains: 



App(Lam("y",Var "y"), App(Lam("x",Var "x"), Lam("z",Var "z")));;



the generated interpreter produces: 

App(Lam("y",Var "y"), App(Lam("x",Var "x"), Lam("z",Var "z")))
==> by betav,eval
App(Lam("y",Var "y"), Lam("z",Var "z"))
==> by betav,eval
Lam("z",Var "z")



Interpreters generated by the SL system preserve the semantics of object- 
languages in the sense that if a specification is non-deterministic, the generated 
interpreter evaluates input programs non-deterministically. The current version 
does not employ backtracking in evaluation. 
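
The decompose, contract, and refill cycle described above can be illustrated by a small hand-written OCaml sketch. It is not the code emitted by the SL system: the names ctx, plug, decompose, step, and eval are chosen for this example, contexts are represented as a datatype rather than by the generated automaton, and substitution is the naive one, which is adequate only when the substituted value is closed.

```ocaml
(* Hand-written sketch of decompose / contract / plug; not SL output. *)
type term = Var of string | Lam of string * term | App of term * term

(* Evaluation contexts H ::= BOX | App(H,_) | App(V,H), as a datatype. *)
type ctx = Box | AppL of ctx * term | AppR of term * ctx

(* The dynamic definition V = Lam _ read as a predicate. *)
let is_value = function Lam _ -> true | _ -> false

(* Naive substitution: adequate only when v is closed (assumption). *)
let rec subst t x v =
  match t with
  | Var s -> if s = x then v else t
  | Lam (s, b) -> if s = x then t else Lam (s, subst b x v)
  | App (a, b) -> App (subst a x v, subst b x v)

(* Fill the hole of a context with a term. *)
let rec plug h t =
  match h with
  | Box -> t
  | AppL (h', m) -> App (plug h' t, m)
  | AppR (v, h') -> App (v, plug h' t)

(* Split a term into an evaluation context and a betav redex, if any. *)
let rec decompose t =
  match t with
  | App (Lam _, v) when is_value v -> Some (Box, t)
  | App (m, n) when not (is_value m) ->
      Option.map (fun (h, r) -> (AppL (h, n), r)) (decompose m)
  | App (v, n) ->
      Option.map (fun (h, r) -> (AppR (v, h), r)) (decompose n)
  | _ -> None

(* One step: contract the redex and put the result back in the context. *)
let step t =
  match decompose t with
  | Some (h, App (Lam (x, b), v)) -> Some (plug h (subst b x v))
  | _ -> None

(* Iterate until no redex is left. *)
let rec eval t = match step t with Some t' -> eval t' | None -> t

let example =
  App (Lam ("y", Var "y"), App (Lam ("x", Var "x"), Lam ("z", Var "z")))

let () = assert (eval example = Lam ("z", Var "z"))
```

On the input term shown above, step first contracts the inner application and then the outer one, mirroring the two betav,eval steps of the trace.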



3 Compiling SL specifications 

The compilation of an SL specification includes the usual phases such as lexing,
parsing, static checking, and code generation. For an object-language specified
by an SL program, a parser and a pretty-printer of the object-language are
generated from the signature part, and a reduction machine based on
pattern-matching automata is generated from the semantic rules. These parts
work together as an interpreter for the object-language with the support of the
SL runtime libraries.

Next, we introduce some issues in the SL compilation such as typing contexts, 
building automata, and transforming automata into Caml code. 



3.1 Typing contexts 

The type system for SL extends the type system for Caml. The extensions deal 
with dynamic definitions and contexts. Here, we only present the idea of typing 
contexts in a simply-typed framework. Typing dynamic definitions is similar. 

First, we give definitions of meta-expressions, context expressions, and their 
types. 



    Expressions          E ::= c₀ | c₁ E | (E, E) | x | λx.E | E E | N[E]
    Context Expressions  H ::= □ | N | c₁ H | (H, E) | (E, H) | N[H]
    Context Definitions  L ::= N = H | ... | H
    Types                τ ::= α | τ * τ | τ → τ
    Context Types        U ::= τ ∘→ τ



We write x for variables, c₀ for nullary constructors, c₁ for constructors of
arity one, α for constant types, and N for context names. Note that the symbol
"=" and the symbol "|" in a context definition are symbols of the SL language.
The expressions include context filling, and tuple expressions are represented as
nested pairs. Context expressions are distinguished from expressions, for they
always contain exactly one hole. The type of a context expression has the form
τ₁ ∘→ τ₂, where τ₁ is the type of the hole and τ₂ is the type of the whole
expression once the hole is filled. For a context definition
N = H₁ | ... | Hₙ, each Hᵢ should have the same type as the context type of N.

The typing rules are given in Table 1. Γ is a basis for type checking, which
contains type assignments for constructors, variables, and context names. It has
properties such as weakening where it may have unused assignments,
strengthening where unused assignments can be removed, permutation where the
order of assignments is irrelevant, and contraction where assignments can be
used more than once. The first part in the table is the set of rules for
expressions. Most are standard except the rule for context filling, which is
similar to function application. The second part is the set of rules for
context expressions. These rules express the "lifting" of the context type
constructor ∘→ whenever possible, so that context expressions preserve context
types. The rule for filling contexts with context expressions is similar to
function composition. If a context N has type τ₁ ∘→ τ₂ and a context
expression h has type τ₀ ∘→ τ₁, then the context expression N[h] has type
τ₀ ∘→ τ₂.

Table 1. The type checking rules for SL expressions and context expressions
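
The two filling rules described in the text (filling with an expression behaves like application, filling with a context expression behaves like composition) can be sketched in OCaml as follows. This is only an illustration under simplifying assumptions: the types ty and cty and the functions fill_expr and fill_ctx are hypothetical names, types are compared structurally, and the basis for type checking is left out.

```ocaml
(* Simple types: constant types, products, and function types. *)
type ty = Const of string | Prod of ty * ty | Arrow of ty * ty

(* A context type: the type of the hole and the type of the filled term. *)
type cty = { hole : ty; result : ty }

(* N[e]: like function application. If N has hole type t1 and result
   type t2, and e has type t1, then N[e] has type t2. *)
let fill_expr (n : cty) (e : ty) : ty option =
  if n.hole = e then Some n.result else None

(* N[h]: like function composition. If N has hole type t1 and result
   type t2, and h has hole type t0 and result type t1, then N[h] has
   hole type t0 and result type t2. *)
let fill_ctx (n : cty) (h : cty) : cty option =
  if h.result = n.hole then Some { hole = h.hole; result = n.result }
  else None

(* The context H of Fig. 1 has terms of type M in its hole and yields
   a term of type M. *)
let m = Const "M"
let h_type = { hole = m; result = m }
let () = assert (fill_expr h_type m = Some m)
let () = assert (fill_ctx h_type h_type = Some h_type)
```

Returning an option keeps the sketch free of error reporting; a real checker would presumably report which rule failed and where.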



3.2 Building pattern-matching automata 

Pattern-matching is the crucial part in compiling an SL specification. A naive 
way is to check a list of patterns one by one. The obvious drawback of this 
approach is inefficiency. Tree-like automata [GS84] address the efficiency issue. 
The matching proceeds by making branches of different constructors and as- 
cribing the list of patterns to those branches. This approach has the disad- 
vantage of space explosion. The combination of tree automata with failures 
is the basis for the current implementations of most common functional lan- 
guages [Mar94, Ler90]. The pattern-matching of the SL system follows this ap- 
proach, but the support of semantic notions and the non-determinism of rewrit- 
ing require extensions to the existing algorithms. 

1. SL Patterns 

The SL patterns include common patterns such as wildcard patterns, variable
patterns, alternative patterns, type constraint patterns, alias patterns,
and tuple patterns. The SL patterns are also enriched with dynamic
constraint patterns and context filling patterns. The dynamic constraint
(p : dynamic_name) and the context filling (p : context_name) p₂ require p to
be a wildcard pattern or a variable pattern. The SL patterns are formally
defined as follows:

    Patterns             P ::= _ | x | c₀ | c₁ P | P '|' P | P as x |
                               (P : type) | (P, ..., P) |
                               (P : dynamic_name) |
                               (P : context_name) P

    Axioms               A ::= P when E ==> E

                                   E ==> Q
    Inference Rules      I ::= ------------------
                               P when E |==> E

    Dynamic Definitions  D ::= P

    Context Definitions  H(x) ::= P

where E denotes Caml expressions, and Q denotes restricted SL patterns 
which do not contain dynamic constraints and context fillings. 
The SL patterns are used in the left-hand sides of axioms and of conclusion 
clauses of inference rules. Dynamic definitions can be considered as defini- 
tions of SL patterns, and context definitions are parametric patterns where 
the parameters denote the holes. Both definitions can be recursively defined. 
The dynamic definition, context definition, and rules in Fig. 1 can be repre- 
sented in terms of SL patterns as follows: 

    V = Lam _

    β_v :  App(Lam(x, M), (v : V)) ==> M[x := v]

    H(x) = x | App((h₁ : H)(x), _) | App((v : V), (h₂ : H)(x))

                  t₁ ==> t₂
    eval : ----------------------
            (h : H) t₁ |==> h t₂



where variables are introduced for dynamic constraint and context filling 
patterns. 
2. Automata

An automaton consists of states. Some states are final. Matching a term con- 
sists of traversing the states until reaching a final one. States are inductively 
defined as follows: 

    States  S ::= branch(t, (test, S), ..., (test, S)) | accept E |
                  S '|' ... '|' S | fail | let vars = f E in S |
                  if E then S else S | let vars = vars in S

where vars denotes a variable or a tuple of variables. 

The first four states are standard. branch(t, (test₁, S₁), ..., (testₙ, Sₙ)) is
a branch-test state, where t is the term under test, and each testᵢ has the
form c₀, c₁ x, or "_" for otherwise. All tests are mutually disjoint. accept e
is a final state, where e is a Caml expression representing the action after
acceptance. A choice state S₁ | ... | Sₙ has alternatives S₁, ..., Sₙ. When
pattern-matching traverses this state, it non-deterministically chooses to
enter an alternative. If a final state is reached, then traversing the choice
state is successful. Otherwise it backtracks to other alternatives. This
semantics differs from the usual choice state, whose alternatives are ordered
(lexically). fail is a failure state. One new form, the reference state
let vars = f e in S₁, is added to support dynamic values and contexts. It calls
a function that matches the parameter as the corresponding dynamic value,
context, or redex. If the call succeeds, matching continues in state S₁;
failures in S₁ may cause backtracking to other possibilities in the function
call f e.

States are also extended with conditional expressions and let variable bind- 
ings. The reason for the former extension is that the semantic rules are 
conditional. The latter extension is helpful for code generation. States are 
annotated with terms to be matched, but we made them implicit in our 
presentation. 

3. Structures for pattern-matching

The SL compiler collects the inference rules and axioms with the same type 
together. The patterns of the rules form a vector which can be considered as a 
one-column matrix. Each rule contributes a row in the matrix. We introduce 
parameters bound to the terms matching the patterns, and we keep track 
of variable bindings in pattern-matching. There is also a state for each rule, 
indicating what to do when the patterns of the rule have been matched. The 
whole pattern-matching structure is represented as follows: 

    /  t₁   ...  tₙ         \
    |  p₁₁  ...  p₁ₙ   S₁   |
    |  ...                  |
    \  pₘ₁  ...  pₘₙ   Sₘ   /

The compilation of pattern-matching can be regarded as a function, C, which
maps such a structure to a state.

The initialization of the pattern-matching sets the states in the structure. 

— For an axiom p₁ when e_c ==> e_r, the corresponding state is:

    if e_c then accept e_r else fail

— For an inference rule

           e₁ ==> p₁
    ----------------------
    p₂ when e_c |==> e₂

the corresponding state is:

    if e_c then let p₁ = rewrite1 e₁ in accept e₂ else fail

where rewrite1 is the one-step rewriting function in the generated code.
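
To make the shape of these initial states concrete, the following OCaml sketch models states as a datatype and builds the state of an axiom and of an inference rule. It is an illustration only: Caml expressions are kept as plain strings, a missing condition is modelled as the string "true", and all type, constructor, and function names (state, Ref, axiom_state, inference_state, and the ~rewrite argument naming the one-step rewriting function) are assumptions of this example.

```ocaml
(* Caml expressions are kept as source text in this sketch (assumption). *)
type expr = string

(* Branch tests: a nullary constructor, a unary constructor applied to
   a variable, or the default case. *)
type test = Con0 of string | Con1 of string * string | Otherwise

type state =
  | Branch of expr * (test * state) list   (* branch(t, (test, S), ...) *)
  | Accept of expr                         (* accept e                  *)
  | Choice of state list                   (* S | ... | S               *)
  | Fail                                   (* fail                      *)
  | Ref of string * string * expr * state  (* let vars = f e in S       *)
  | If of expr * state * state             (* if e then S else S        *)
  | Let of string * string * state         (* let vars = vars' in S     *)

(* State of an axiom "p when cond ==> rhs". *)
let axiom_state ?(cond = "true") rhs = If (cond, Accept rhs, Fail)

(* State of an inference rule with premise "premise_lhs ==> premise_pat"
   and conclusion "p when cond |==> rhs"; ~rewrite names the one-step
   rewriting function (hypothetical name supplied by the caller). *)
let inference_state ?(cond = "true") ~rewrite ~premise_lhs ~premise_pat rhs =
  If (cond, Ref (premise_pat, rewrite, premise_lhs, Accept rhs), Fail)

(* The two rules of Fig. 1. *)
let betav_state = axiom_state "subst(t1, x, t2)"
let eval_state =
  inference_state ~rewrite:"rewrite_one_step"
    ~premise_lhs:"t1" ~premise_pat:"t2" "h t2"
```

Keeping the patterns themselves outside the state matches the description above, where patterns live in the matrix and each row carries only the state to enter once its patterns have matched.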
4. Pattern-matching algorithm 

The pattern-matching algorithm is a divide-and-conquer algorithm. It selects 
one column of the pattern matrix to work on according to certain criteria. 
Without loss of generality, we assume that the algorithm always chooses the 
first column. The algorithm repeats the following steps until the pattern 
matrix is empty: 



(a) Preprocessing 

This step canonicalizes the patterns in the first column. It removes the 
type constraints since the type information is not useful at the current 
stage. It binds variables in alias patterns to the corresponding parame- 
ters. It turns each alternative pattern into several rows having one al- 
ternative each. Formally, the preprocessing repeats the following simpli- 
fications. 
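
As a rough illustration of the three simplifications just described, the following OCaml sketch canonicalizes the first-column pattern of each row. The types pat and row, the field names, and the string labels standing in for states are assumptions made for this example and are not taken from the SL implementation.

```ocaml
type pat =
  | Wild
  | PVar of string
  | Con of string * pat list
  | Alt of pat * pat        (* alternative pattern: p1 | p2     *)
  | As of pat * string      (* alias pattern: p as x            *)
  | Typed of pat * string   (* type constraint pattern: (p : t) *)

(* A row of the matrix: the first-column pattern, the remaining
   patterns, the bindings collected so far (variable, parameter), and a
   label standing in for the state of the rule. *)
type row = {
  first : pat;
  rest : pat list;
  binds : (string * string) list;
  state : string;
}

(* Canonicalize the first column of one row; param is the name of the
   parameter bound to the term matched by that column. *)
let rec preprocess (param : string) (r : row) : row list =
  match r.first with
  | Typed (p, _) ->
      (* drop the type constraint *)
      preprocess param { r with first = p }
  | As (p, x) ->
      (* bind the alias variable to the parameter *)
      preprocess param { r with first = p; binds = (x, param) :: r.binds }
  | Alt (p, q) ->
      (* one row per alternative *)
      preprocess param { r with first = p }
      @ preprocess param { r with first = q }
  | Wild | PVar _ | Con _ -> [ r ]

let preprocess_matrix param rows = List.concat_map (preprocess param) rows
```

Only the outermost layer of the first column is rewritten here; in this sketch, nested alternatives, aliases, and constraints would be handled when their own column is selected later.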
(b) Splitting the matrix

The algorithm splits the matrix horizontally, so that in each submatrix,
all first-column patterns are in one of the following groups:

— variable group: wildcard patterns or variable patterns, 

— tuple group: tuple patterns, 

— constructor group: constructor patterns, 

— dynamic constraint group: dynamic constraint patterns on the same 
dynamic definition, 

— context filling group: context filling patterns on the same context 
definition. 

The result of the splitting is a choice state. 
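
A minimal OCaml sketch of this splitting step is given below. It assumes that rows have already been preprocessed, represents a row as a pair of its first-column pattern and a rule label, and collects rows of the same group together regardless of their original position, which is harmless here only because the alternatives of a choice state are unordered. The names pat, group, group_of, and split are hypothetical.

```ocaml
type pat =
  | Wild
  | PVar of string
  | Tuple of pat list
  | Con of string * pat list
  | Dyn of string * string          (* (x : dynamic_name)   *)
  | Fill of string * string * pat   (* (h : context_name) p *)

type group = GVar | GTuple | GCon | GDyn of string | GFill of string

(* The group of a first-column pattern; dynamic constraints and context
   fillings are grouped per definition name. *)
let group_of = function
  | Wild | PVar _ -> GVar
  | Tuple _ -> GTuple
  | Con _ -> GCon
  | Dyn (_, d) -> GDyn d
  | Fill (_, c, _) -> GFill c

(* Split rows (first-column pattern, rule label) into submatrices, one
   per group, keeping the rows of each group in their original order. *)
let split rows =
  List.fold_left
    (fun acc ((p, _) as row) ->
      let g = group_of p in
      if List.mem_assoc g acc then
        List.map
          (fun (g', rs) -> if g' = g then (g', rs @ [ row ]) else (g', rs))
          acc
      else acc @ [ (g, [ row ]) ])
    [] rows
```

Each submatrix returned by split would then be compiled separately, and the resulting states combined into a choice state.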
(c) Analyzing different cases

For each submatrix, the compilation function C is inductively defined as 
follows: 



i. Base case: 

When the pattern matrix is empty, pattern-matching is vacuously 
successful. The function C creates a choice state. 
ii. Variable group:

Wildcard and variable patterns always match successfully. The func- 
tion C removes the column of patterns. For variable patterns, bind- 
ings are added for further access to the variables. 
iii. Tuple group:

The function C treats each component of the tuple as an individual
pattern. It replaces the first column of patterns with columns of the
component patterns, and introduces parameters for the components.
iv. Constructor group:

The function C collects the rows which have the same constructor in 
the first column into a group of new pattern-matching structures, and 
it creates a branch-test state. The tests of the state are distinguished 
by having different constructors as roots. The corresponding actions 
for the tests are the results of compiling the new structures. In the 
new structures, the first column is removed for the constructors of 
zero arity, or it is replaced by the column of argument patterns for 
the constructors of non-zero arity. 
v. Dynamic constraint group:

The function C creates a reference state. The reference will initiate 
pattern-matching for the dynamic definition D with the value of t₁.
Its result is bound to a new parameter. The state in the let body is 
the result of compiling a structure consisting of the patterns without 
the constraint and the rest of the patterns. 
Pattern-matching for dynamic definitions will be presented later.

vi. Context filling group:

Similar to the dynamic constraint group, the function C creates a 
reference state. The reference will initiate pattern-matching for the 
context definition H with the value of t₁. Its result is bound to a
pair of parameters which represent the context and the corresponding
hole occurring in t₁. The state of the let body is the result of
compiling a structure consisting of the context patterns, the hole 
patterns, as well as the rest of the patterns. 
Pattern-matching for context definitions will be presented next.



5. Matching dynamic definitions and context definitions 

Pattern-matching for dynamic definitions and context definitions uses the
same form of structures and uses the same algorithm. Each definition
corresponds to a structure which has only one pattern, one parameter, and one
state. The pattern comes from the definition. Assuming t is the term to be
matched, the state is initialized as follows:

— For a dynamic definition, the state is accept t.

— For a context definition, the state is accept (λx.body, x), where x is the
parameter for the context and the body is a placeholder. When the base
case in the algorithm is encountered, the SL compiler sets the contents
of body by reconstructing the term t with the variable x. Each body may
have different content if the final state is copied. The reconstruction
retrieves the bindings along the path from the start state to the final
state.

In other words, matching a context definition returns a pair with a
constructing function and a term filling the hole. Applying the function to
the term results in the term t.
The state starting pattern-matching for a definition can be referred to by the 
name of the definition. For example, the dynamic definition and the context 
definition in Fig. 1 are associated with the following states: 

    match_D t = branch(t, (Lam t', accept(t)))

    match_H t = let x = t in accept(λx.body₁, x) |
                branch(t,
                  (App t', let (t₁, t₂) = t' in
                     (let (t₁', t₁'') = match_H t₁ in
                      let h₁ = t₁' in
                      let x = t₁'' in accept(λx.body₂, x)) |
                     (let t₁' = match_D t₁ in let v = t₁' in
                      let (t₂', t₂'') = match_H t₂ in
                      let h₂ = t₂' in
                      let x = t₂'' in accept(λx.body₃, x))))

where body₁, body₂ and body₃ are as follows:

    body₁ : let t = x in t

    body₂ : let t₁'' = x in let t₁' = h₁ in let t₁ = t₁' t₁'' in
            let t' = (t₁, t₂) in let t = App t' in t

    body₃ : let t₂'' = x in let t₂' = h₂ in let t₂ = t₂' t₂'' in
            let t₁' = v in let t₁ = t₁' in let t' = (t₁, t₂) in
            let t = App t' in t

3.3 Transforming automata into Caml code 

The transformation of states in the pattern-matching automata into Caml code
is described in Fig. 2. Each state corresponds to a function with implicit
parameters for the terms to be matched and a continuation expressing the
remaining pattern-matching work. The initial continuation for rewriting is the
identity function. State functions may raise an exception when matching of the
state fails. The branch-test state corresponds to the match ... with ...
construct in Caml. If the branch tests do not cover all constructors of the
same type, a default test associated with a failure state is added. For a final
state, the success continuation is consumed by applying it to the expression.
For a choice state, a random alternative is tried. Failure exceptions may be
caught and then other alternatives can be tried. The failure state just raises
a failure exception. For a reference state, it calls the matching function with
the continuation accepting the result, then it continues with the state in the
body of the let. The conditional expressions and variable bindings in states
are considered atomic with respect to continuation passing and exception
handling. Transforming a conditional state is thus transforming both branch
states, and transforming a let state is transforming the in part.



    ⟦ branch(t, (c₀, S₀), ..., (c₁ x, S₁)) ⟧ k =
        match t with
          c₀ -> ⟦ S₀ ⟧ k
        | ...
        | c₁ x -> ⟦ S₁ ⟧ k
        | _ -> raise failure
    ⟦ accept e ⟧ k = k (e)
    ⟦ S₁ | ... | Sₘ ⟧ k =
        try ⟦ Sᵢ ⟧ k                  (0 < i ≤ m)
        with failure ->
          ⟦ S₁ | ... | Sᵢ₋₁ | Sᵢ₊₁ | ... | Sₘ ⟧ k
    ⟦ fail ⟧ k = raise failure
    ⟦ let vars = f e in S ⟧ k =
        f e (λt. let vars = t in ⟦ S ⟧ k)
    ⟦ if e_c then S₁ else S₂ ⟧ k =
        if e_c then ⟦ S₁ ⟧ k else ⟦ S₂ ⟧ k
    ⟦ let vars₁ = vars₂ in S ⟧ k =
        let vars₁ = vars₂ in ⟦ S ⟧ k



Fig. 2. Generating Caml code 
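
To illustrate the scheme of Fig. 2, the following OCaml sketch transliterates by hand the matching functions for the dynamic definition V and the context H of Fig. 1 into continuation-passing style with a failure exception. It is not the output of the SL compiler and makes several simplifying assumptions: alternatives of a choice state are tried in list order rather than at random, the premise of eval is discharged by calling the betav axiom directly, substitution is naive (adequate for closed values only), and the names Fail, choice, match_v, match_h, betav, rewrite_step, and eval are hypothetical.

```ocaml
type term = Var of string | Lam of string * term | App of term * term

(* Plays the role of the failure exception of Fig. 2. *)
exception Fail

(* Choice state: try an alternative; on failure, try the remaining ones.
   A failure raised later, inside the continuation k, also lands here,
   which is what produces backtracking. *)
let rec choice alts k =
  match alts with
  | [] -> raise Fail
  | s :: rest -> (try s k with Fail -> choice rest k)

(* Dynamic definition V = Lam _ : accept the term itself. *)
let match_v t k = match t with Lam _ -> k t | _ -> raise Fail

(* Context definition H = BOX | App(H,_) | App(V,H): pass to k a pair
   of a rebuilding function and the term found in the hole. *)
let rec match_h t k =
  let here k = k ((fun x -> x), t) in
  let inside k =
    match t with
    | App (t1, t2) ->
        let left k =
          match_h t1 (fun (h1, r) -> k ((fun y -> App (h1 y, t2)), r)) in
        let right k =
          match_v t1 (fun v ->
              match_h t2 (fun (h2, r) -> k ((fun y -> App (v, h2 y)), r)))
        in
        choice [ left; right ] k
    | _ -> raise Fail
  in
  choice [ here; inside ] k

(* Naive substitution: adequate only when v is closed (assumption). *)
let rec subst t x v =
  match t with
  | Var s -> if s = x then v else t
  | Lam (s, b) -> if s = x then t else Lam (s, subst b x v)
  | App (a, b) -> App (subst a x v, subst b x v)

(* Axiom betav: App(Lam(x,t1), (t2:V)) ==> subst(t1, x, t2). *)
let betav t k =
  match t with
  | App (Lam (x, b), t2) -> match_v t2 (fun v -> k (subst b x v))
  | _ -> raise Fail

(* Inference eval: decompose with H, rewrite the hole, refill. *)
let rewrite_step t = match_h t (fun (h, t1) -> betav t1 (fun t2 -> h t2))

(* Rewrite until no rule applies. *)
let rec eval t =
  match rewrite_step t with
  | t' -> eval t'
  | exception Fail -> t

let example =
  App (Lam ("y", Var "y"), App (Lam ("x", Var "x"), Lam ("z", Var "z")))

let () = assert (eval example = Lam ("z", Var "z"))
```

On the example term, the first decomposition tried (the empty context) makes betav fail because the argument is not yet a value; the exception is caught by choice, which then tries the decompositions inside the application until one yields a redex.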



4 Summary and Discussion 



The SL system uses the first-order types of functional languages for
specifying abstract syntax. We have not followed the approach of higher-order
abstract syntax (HOAS) [PE88]. The HOAS representation would allow the
variables of the object language to be represented as meta-variables, and hence
would alleviate the need for explicitly reasoning about variable renaming and
substitution of variables. However, higher-order abstract syntax interacts
poorly with the inductive reasoning techniques needed to reason about the
properties of semantic specifications [DPS97].

The SL system uses conditional rewriting rules for specifying semantic rules.
In other words, the system employs rewriting semantics (a.k.a. reduction
semantics). Some systems [HM92, CDD+88] express rules in natural semantics
[CDD+85]. Natural semantics is well adapted to describing static behaviors
such as typing. For transformational behaviors, such as dynamic semantics,
rewriting semantics proves to be more modular [WF94], and should therefore be
more tractable when it comes to expressing full-size programming languages.
Another advantage of rewriting semantics is that it allows one to observe
intermediate states of reductions, so that it is more suitable for
non-terminating object-systems.

ELAN [ELA] is a system also based on first-order rewriting semantics. It has
more general application areas than the SL system. It supports many-to-one
associative-commutative (AC) patterns and provides an efficient algorithm
[MK98]. Its strategy language gives flexible control over non-deterministic
reductions. The Stratego [VB98] system has more generic strategy
specifications. There are also general-purpose tools that can be used for
manipulating formal semantics, such as Coq [Coq], Isabelle [Isa] and Twelf
[Twe]. Compared to those systems, the novelty of the SL system is that it
directly supports the specification of semantic notions such as dynamic values
and evaluation contexts, and it automatically generates executable
interpreters. Associative-commutative patterns can be represented by contexts,
but the current SL system does not optimize matching for efficiency.

Fahndrich and Boyland [FB97] investigate abstract patterns which are even 
more general than contexts in the SL system. They build automata for pat- 
terns and check the properties of the patterns such as exhaustiveness and non- 
overlapping, but they did not provide a computational view of pattern-matching. 

The current prototype of the SL system constitutes a first step towards 
designing a specification language for syntactic theories and implementing a sys- 
tem generating interpreters from specifications. It also provides an extension 
to pattern-matching techniques. A number of interesting examples have been 
tested in the prototype system. Follow-up work is underway to allow more ex- 
pressive specifications, to optimize the compilation of the SL language, and to 
automatically prove properties of syntactic theories. 